1b), and if it’s clear that the fraction of positive outcomes isn’t leveling off at 0 or 1 for very large or very small X values, then logistic regression is not the correct modeling approach.

The H-L test described earlier under the section “Assessing the adequacy of the model” provides a

statistical test to determine if your data qualify for logistic regression. Also, in Chapter 19, we

describe a more generalized logistic model that contains other parameters for the upper and lower

leveling-off values.
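If you want to see what the H-L test actually computes, here is a minimal Python sketch, assuming the observed 0/1 outcomes and the model’s predicted probabilities are already in NumPy arrays (y_obs and p_pred are hypothetical names); statistical packages report this for you, so treat it as an illustration only:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_obs, p_pred, groups=10):
    """Hosmer-Lemeshow chi-square statistic and p value (df = groups - 2)."""
    order = np.argsort(p_pred)                    # sort subjects by predicted risk
    y, p = y_obs[order], p_pred[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(p)), groups):  # ~equal-size risk groups
        observed = y[idx].sum()                   # observed events in this group
        expected = p[idx].sum()                   # expected events in this group
        n = len(idx)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n))
    return stat, chi2.sf(stat, df=groups - 2)
```

A large statistic (small p value) suggests the fitted logistic curve doesn’t match the observed fractions of positive outcomes.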

Watch out for collinearity and disappearing significance: When you are doing any kind of regression and two or more predictor variables are strongly related to each other, you can be plagued with problems of collinearity. We describe this problem in Chapter 17, and potential modeling solutions in Chapter 20.
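One common screen for collinearity is the variance inflation factor (VIF). Here is a brief sketch using statsmodels and a small made-up DataFrame of predictors; values well above 5 or 10 are usually read as warning signs:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictor data; in practice use your own study variables
X = pd.DataFrame({"dose":   [10, 20, 30, 40, 50, 60],
                  "age":    [25, 34, 41, 52, 63, 47],
                  "weight": [60, 72, 80, 77, 90, 85]})

Xc = sm.add_constant(X)                           # include an intercept column
vifs = {col: variance_inflation_factor(Xc.values, i)
        for i, col in enumerate(Xc.columns) if col != "const"}
print(vifs)                                       # large values flag collinear predictors
```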

Check for inadvertent reverse-coding of the outcome variable: The outcome variable should always be coded as 1 for a yes outcome and 0 for a no outcome (refer to Table 18-1 for an example). If the variable in the data set is coded using characters, you should recode the outcome variable using the 0/1 coding. It is important that you do the coding yourself rather than leave it to an automated function in the program, because the function may inadvertently reverse the coding so that 1 = no and 0 = yes. This error of reversal won’t affect any p values, but it will cause all your ORs and their CIs to be the reciprocals of what they would have been, meaning they will refer to the odds of no rather than the odds of yes.
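As a minimal sketch of doing the recoding explicitly, here is one way in Python with pandas, using a hypothetical character-coded column named outcome:

```python
import pandas as pd

df = pd.DataFrame({"outcome": ["yes", "no", "yes", "no", "no"]})  # hypothetical data

# Map the codes explicitly so that 1 = yes and 0 = no; an automated
# conversion might silently choose the reverse assignment.
df["outcome01"] = df["outcome"].map({"yes": 1, "no": 0})

assert set(df["outcome01"]) <= {0, 1}             # sanity-check the recoding
```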

Don’t misinterpret odds ratios for categorical predictors: Categorical predictors should be coded numerically as we describe in Chapter 8. It is important to ensure that proper indicator variable coding is used and that these variables are entered properly into the model, as described in Chapter 17.
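To illustrate, here is a short pandas sketch of indicator (dummy) coding for a hypothetical three-level treatment variable; one level serves as the reference category, and each fitted OR then compares a level against that reference:

```python
import pandas as pd

df = pd.DataFrame({"treatment": ["placebo", "low", "high", "low", "placebo"]})

# Make "placebo" the reference level explicitly, then create 0/1 indicators
df["treatment"] = pd.Categorical(df["treatment"],
                                 categories=["placebo", "low", "high"])
dummies = pd.get_dummies(df["treatment"], prefix="treat", drop_first=True)
print(dummies)    # treat_low and treat_high columns; placebo rows are 0 in both
```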

Also, be careful not to misinterpret odds ratios for numerical predictors, and be mindful of the

complete separation problem, as described in the following sections.

Don’t misinterpret odds ratios for numerical predictors

The OR always represents the factor by which the odds of getting the outcome event increase when the predictor increases by exactly one unit of measure, whatever that unit may be.

Sometimes you may want to express the OR in more convenient units than those in which the data were recorded. For the example in Table 18-1, the OR for dose as a predictor of death is 1.0115 per REM. This isn’t too meaningful because one REM is a very small increment of radiation. By raising 1.0115 to the 100th power, you get the equivalent OR of 3.1375 per 100 REMs, and you can express this as, “Every additional 100 REMs of radiation more than triples the odds of dying.”
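The unit conversion is plain exponentiation, as this quick check of the numbers from the text shows:

```python
or_per_rem = 1.0115                 # OR per single REM, from Table 18-1
or_per_100rem = or_per_rem ** 100   # OR per 100 REMs
print(round(or_per_100rem, 4))      # about 3.1375: more than triple the odds
```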

The value of a regression coefficient depends on the units in which the corresponding predictor

variable is expressed. So the coefficient of a height variable expressed in meters is 100 times larger

than the coefficient of height expressed in centimeters. In logistic regression, ORs are obtained by

exponentiating the coefficients, so switching from centimeters to meters corresponds to raising the OR

(and its confidence limits) to the 100th power.
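Here is a minimal sketch of that relationship, using a hypothetical per-centimeter coefficient:

```python
import math

beta_per_cm = 0.02                        # hypothetical coefficient for height in cm
beta_per_m = beta_per_cm * 100            # same effect when height is in meters

or_per_cm = math.exp(beta_per_cm)         # OR per additional centimeter
or_per_m = math.exp(beta_per_m)           # OR per additional meter

assert abs(or_per_m - or_per_cm ** 100) < 1e-9   # exp(100*b) == exp(b) ** 100
```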